Introduction
White-box Attacks
In white-box adversarial attacks, the attacker has complete knowledge of
the target model, including its architecture, weight parameters, and training data. With direct
access to this internal information, the attacker can readily analyze the model's characteristics
and vulnerabilities, and exploit gradients, loss values, and other signals to generate adversarial
examples in a targeted manner, causing the model to produce misleading outputs. White-box attacks
typically backpropagate gradient information to the input and perturb it in the direction that
maximizes the loss, steering the model's output toward the attacker's goal. With carefully designed
adversarial examples, the attacker can drive the model to incorrect decisions, posing significant
harm in practical applications.
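To make the gradient-based procedure concrete, the following is a minimal sketch of a one-step FGSM-style perturbation in PyTorch. It assumes a differentiable model over a continuous input tensor (e.g., token embeddings); the model, its inputs, and the step size epsilon are illustrative assumptions, and for code models the perturbation would additionally have to be mapped back to valid discrete tokens.

import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.01):
    """One-step gradient-based (FGSM-style) perturbation sketch.

    Assumes `model` maps a continuous input tensor (e.g., token
    embeddings) to class logits; for discrete code the perturbed
    embeddings would still need to be projected back to real tokens.
    """
    x_adv = x.clone().detach().requires_grad_(True)
    logits = model(x_adv)
    loss = F.cross_entropy(logits, y)  # loss the attacker wants to maximize
    loss.backward()
    # Step in the direction that increases the loss the most.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.detach()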
Black-box Attacks
In recent years, black-box adversarial attacks on neural code
models have been widely studied. In contrast to white-box attacks, where the attacker has detailed
knowledge of the model's architecture and weights, black-box attacks assume the attacker cannot
access such internal information. Instead, the attacker can only query the model and observe its
limited outputs, such as predicted labels or confidence scores, to construct adversarial examples.
The harm caused by black-box attacks manifests mainly as degraded model performance and threats
to system security: even without detailed model information, a carefully constructed adversarial
example can mislead a neural code model and reduce its accuracy on practical tasks. This poses a
threat not only to downstream software engineering tasks but also to security-critical systems.
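As an illustration of the query-only setting, the sketch below performs a greedy identifier-renaming search against a black-box code classifier. The query_model interface, the candidate name list, and the query budget are hypothetical assumptions used only to show how an attacker works from model outputs alone; it is not a specific published attack.

def black_box_rename_attack(query_model, code_tokens, identifiers,
                            candidates, max_queries=100):
    """Greedy query-based attack sketch for a black-box code classifier.

    `query_model` is a hypothetical interface returning the model's
    predicted label for a token sequence; each identifier is renamed
    in turn, and a rename is kept only if it flips the prediction.
    """
    original_label = query_model(code_tokens)
    tokens = list(code_tokens)
    queries = 0
    for ident in identifiers:
        for new_name in candidates:
            if queries >= max_queries:
                return tokens, False  # query budget exhausted
            trial = [new_name if t == ident else t for t in tokens]
            queries += 1
            if query_model(trial) != original_label:
                return trial, True  # adversarial example found
        # keep the original name if no candidate flipped the prediction
    return tokens, False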